SDA 4.1 Documentation for COMPUTE
NAME
compute - compute a new variable
USAGE
compute -b filename
DESCRIPTION
COMPUTE creates a new SDA variable by performing calculations on
existing variables or by generating random distributions. Users
who run this program interactively should see the online help document.
The user does not have to specify the number of decimals to store
in the new variable. If the `decimals=' keyword is missing, the
new variable is stored with as many decimals as necessary.
To run this program in batch mode it is necessary to prepare a
command file, which specifies how the new variable is to be
created and the options to use. The name of this batch command
file is specified to the program after the `-b' option flag.
This document explains how to prepare such a file.
BATCH FILE LAYOUT
The batch file is laid out in separate parts. The algebraic
expression that defines the new variable must be the first part,
followed by an asterisk (*) on a line by itself. The elements of
the other parts can be given in any order. The category labels
and the descriptive text can have varying numbers of lines, and
each of those parts ends with an asterisk (*) on a line by
itself.
The general layout is as follows:
(Algebraic expression)
*
(Input and output specifications)
CATEGORIES= [optional]
(Category text and labels)
*
TEXT= [optional]
(Descriptive text for the new variable)
*
A line consisting only of two asterisks (**) can be added to
separate sets of COMPUTE commands in the same batch file. This is
how you include multiple computes in the same file.
THE EXPRESSION
The name and the numeric values of the new variable are defined
by an algebraic expression. It is important to understand that a
new variable is defined by appearing on the left side of an equal
sign, and a new variable can appear only once in that position
(except in `if' statements).
Note the rules for variable names in SDA.
A complete list of operators and functions that can be used in
expressions is given later in this document. A few examples of
expressions follow; many more can be found in the section on
'expressions' in the online help document.
Simple expressions on one line
newvar = var1 + var2
newvar = sqrt(var1)
newvar = mean.2(var1,var2,var3)
if (var1 eq 1) newvar = var2
Expressions with If / Else if / Else
if (var1 eq 1)
newvar = var3
else if (var1 eq 2) [A space after `else' is optional]
newvar = var4
else
newvar = -1
endif
The `ELSE IF' part can be repeated; `ELSE' can be used only once;
both parts are optional.
The words `IF', `ELSE IF', `ELSE', and `ENDIF' should begin on a
new line. Note that they can be either in upper or in lower
case.
If no `ELSE' part is used, it is possible that some cases will
not meet any of the conditions; the new variable will then be set
to the specified missing data code for those cases.
There is an implied `ENDIF' at the end of the entire expression.
Therefore, the use of `ENDIF' is optional unless there are nested
IF-statements.
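The fall-through rule described above can be sketched in Python. This is only an illustration of the documented behavior, not SDA code: the function name `compute_newvar' and the missing-data code -9 are hypothetical, chosen for the example.

```python
# Illustration only: mirrors the If / Else if example above, but with the
# final ELSE part left out, so some cases meet none of the conditions.
MD_CODE = -9  # hypothetical missing-data code (what `MD=' would supply)


def compute_newvar(var1, var3, var4):
    """Return the new value for one case, or MD_CODE if no branch applies."""
    if var1 == 1:
        return var3
    elif var1 == 2:
        return var4
    # No ELSE part: this case met none of the conditions.
    return MD_CODE


print(compute_newvar(1, 10, 20))  # first branch applies
print(compute_newvar(2, 10, 20))  # second branch applies
print(compute_newvar(3, 10, 20))  # no branch applies, so MD_CODE is stored
```

Adding a final `else' to the expression is the way to give such cases a valid value instead.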
Expressions with Temporary Variables
Complicated expressions can be specified in steps using temporary
variables -- variables with names that begin with `$'. These
variables only exist while COMPUTE is running.
Each expression using a temporary variable must be on a separate
line, before the final line that gives the name of the new
variable to be saved.
Temporary variables can only be used in assignment statements.
They cannot be used in the test portion of an IF-statement.
The following is an example of the use of temporary variables.
$temp1 = var1 + var2
$temp2 = var1 * var2
newvar = $temp1 / $temp2
OPERATORS USED IN THE EXPRESSION
Arithmetic operators
Operator   Meaning
+ - * /    Addition, subtraction, multiplication, division
^          Power
           for example: newvar = var1^2
           (`newvar' is the square of `var1')
-          Unary `-' (negative of a variable or expression)
           for example: newvar = -var1
           (`newvar' has the opposite sign of `var1')
()         Parentheses are used to alter (or clarify) the usual
           order of evaluation.
Precedence of operators, from highest to lowest: functions,
unary -, ^, * and /, + and -; then left to right within a level.
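Read literally, the precedence list above places unary `-' ahead of `^', which would make `-var1^2' mean the square of `-var1'. Many languages, Python included, do the opposite. The short Python sketch below only shows how the two readings differ. It has not been checked against COMPUTE itself, so parentheses are the safe choice whenever a negative sign and a power appear together.

```python
var1 = 3

# Python's own rule: ** binds more tightly than unary minus.
python_default = -var1 ** 2      # evaluated as -(3 ** 2)

# The two possible readings, made explicit with parentheses.
negate_first = (-var1) ** 2      # unary minus applied before the power
power_first = -(var1 ** 2)       # power applied before unary minus

print(python_default, negate_first, power_first)  # -9 9 -9
```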
Logical operators to use with If / Else if
The arguments `x' or `y' stand for either an existing SDA
variable, a constant, or another expression.
Operator   Meaning            Example
EQ         equal to           if (x eq y) newvar = 1
NE         not equal to       if (x ne y) newvar = 1
GT         greater than       if (x gt y) newvar = 1
GE         greater or equal   if (x ge y) newvar = 1
LT         less than          if (x lt y) newvar = 1
LE         less or equal      if (x le y) newvar = 1
AND        logical AND        if (x lt 2 AND y lt 2) newvar = 1
OR         logical OR         if (x lt 2 OR y lt 2) newvar = 1
These operators can be in upper or lower case.
FUNCTIONS USED IN THE EXPRESSION
The functions listed below are recognized in expressions by the
COMPUTE program. The name of each function can be given in
either upper or lower case.
The arguments `a' or `b' stand for a specific constant (2 or 4.5,
for example). The arguments `x' or `y' stand for either an
existing SDA variable, a temporary variable, a constant, or
another expression.
Arithmetic Functions
- ABS(x)
- Absolute value
- EXP(x)
- Exponential function (antilog), e^x
- LOG(x) or LN(x)
- Natural logarithm
- LOG10(x) or LG10(x)
- Logarithm - base 10
- MOD(x,a)
- Modulus (remainder) of `x' divided by `a' (e.g., mod(5,2)
equals 1)
- ROUND(x) or RND(x)
- Round off (e.g., round(2.5) equals 3)
- SQRT(x)
- Square root
- TRUNC(x)
- Truncate (e.g., trunc(2.5) equals 2)
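The three worked values given above (mod(5,2), round(2.5), trunc(2.5)) can be reproduced in Python as a rough cross-check. The helper `sda_round' is a hypothetical name. It rounds halves upward to match the documented round(2.5) = 3, and its behavior for negative numbers is an assumption, since the list above does not cover that case.

```python
import math


def sda_round(x):
    """Round half up, matching the documented round(2.5) == 3.

    How COMPUTE treats negative halves is not documented above;
    this sketch simply applies the same floor(x + 0.5) rule.
    """
    return math.floor(x + 0.5)


print(5 % 2)            # 1, as in mod(5,2)
print(sda_round(2.5))   # 3, as in round(2.5)
print(round(2.5))       # 2, Python's built-in rounds halves to even
print(math.trunc(2.5))  # 2, as in trunc(2.5)
```

The contrast with Python's built-in `round' is the reason for the helper: checking COMPUTE results with a different tool can show small differences on exact halves.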
Summaries of Variables
- MEAN.n (x,y,...)
- Mean of the given variables
- SUM.n (x,y,...)
- Sum of the given variables
- MIN.n (x,y,...)
- Minimum value of the given variables
- MAX.n (x,y,...)
- Maximum value of the given variables
Note: the `.n' part of the function name is optional. If
used, it tells the function that at least `n' of the given
variables must have valid data for a case; otherwise the function
returns the missing-data code. The default value for `n' is 1.
For example, `mean(var1,var2,var3)' will generate the mean of the
three variables, even if only one of the three has a valid code.
On the other hand `mean.2(var1,var2,var3)' will generate a mean
for a specific case only if at least two of the variables have
valid codes.
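The effect of the `.n' suffix can be sketched in Python. The function name `mean_n' is hypothetical, and None stands in for a missing-data code. Both are assumptions of this sketch, not part of SDA.

```python
def mean_n(values, n=1):
    """Mean of the valid values, or None if fewer than n are valid.

    None plays the role of a missing-data code in this sketch.
    """
    valid = [v for v in values if v is not None]
    if len(valid) < max(n, 1):
        return None
    return sum(valid) / len(valid)


case = [4, None, None]             # only one of three variables is valid
print(mean_n(case))                # 4.0, like mean(var1,var2,var3)
print(mean_n(case, n=2))           # None, like mean.2(var1,var2,var3)
print(mean_n([4, 2, None], n=2))   # 3.0, two valid values are enough
```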
Other Summaries
- COUNT(x,y(a-b))
- Number of variables with values between `a' and `b'. (You
can specify a different value or a different range for each
variable; for example:
count(var1(1), var2(1-3), var3, var4(5-7)).
In the above example, the range `5-7' applies to var3 as well as
to var4; the last variable in the list MUST have a specified
value or range. Missing-data or out-of-range codes are not
counted unless the keyword `missing=valid' has been specified.)
- CUM(x)
- Cumulate the value of `x' from one case to the next. (The
first case is just the value of `x' for that case; subsequent
cases keep adding the value of `x'. If `x' for a case is a
missing-data code, and if the keyword `missing=valid' has NOT
been specified, the cumulative value is the same as for the
preceding case; cumulation resumes with the next case.)
- MISSING (x,y,...)
- Number of variables with missing-data or out-of-range
values.
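The COUNT and CUM rules above can be sketched in Python for the default case where `missing=valid' has not been specified. The function names are hypothetical, and None stands in for a missing-data code. The sketch also assumes that the endpoints of a range are included and that a missing first case yields a missing cumulative value, neither of which is stated above.

```python
def count_in_ranges(values, ranges):
    """Count how many values fall inside their range for one case.

    ranges holds a (low, high) pair per variable, or None when the
    variable borrows the next specified range, as var3 borrows `5-7'.
    """
    if ranges[-1] is None:
        raise ValueError("the last variable must have a value or range")
    filled = list(ranges)
    # Fill unspecified ranges from the right.
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]
    return sum(
        1
        for v, (low, high) in zip(values, filled)
        if v is not None and low <= v <= high  # endpoints assumed inclusive
    )


def cum(values):
    """Running total over cases; a missing case repeats the prior total."""
    out, total = [], None
    for v in values:
        if v is not None:
            total = v if total is None else total + v
        out.append(total)
    return out


# count(var1(1), var2(1-3), var3, var4(5-7)) for one case
print(count_in_ranges([1, 2, 6, 9], [(1, 1), (1, 3), None, (5, 7)]))  # 3
print(cum([2, 3, None, 5]))  # [2, 5, 5, 10]
```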
Random Distribution Functions
- UNIFORM(x,y)
- Uniform distribution between `x' and `y'
- DUNIFORM(x,y)
- Discrete uniform distribution between `x' and `y'.
(The result is a whole number.)
- NORMAL(x,y)
- Normal distribution with mean `x' and standard deviation `y'
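For readers who want to see what these three distributions look like, the Python sketch below draws one value from each using the standard `random' module. It is an analogy only: Python's generator is not SDA's, so the same seed will not reproduce COMPUTE's numbers. The function name `draw' is hypothetical, and treating the DUNIFORM endpoints as inclusive is an assumption.

```python
import random


def draw(seed):
    """Draw one value from each distribution with a fixed seed."""
    rng = random.Random(seed)   # fixed seed makes the draws repeatable
    u = rng.uniform(0, 1)       # comparable to uniform(0,1)
    d = rng.randint(1, 6)       # comparable to duniform(1,6); whole number
    z = rng.gauss(0, 10)        # comparable to normal(0,10)
    return u, d, z


first = draw(12121)
second = draw(12121)
print(first == second)  # True: the same seed gives the same draws
```

A fixed seed is what makes a random variable reproducible from one run to the next. In COMPUTE that role is played by the `SEED=' keyword.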
Trigonometric Functions
- SIN(x), COS(x)
- Sine and cosine (`x' is in radians)
- ARCSIN(y) or ARSIN(y)
- Arcsine (`y' is a sine between +1 and -1; result is in
radians)
- ARCTAN(z) or ARTAN(z)
- Arctangent (`z' is a tangent > 0; result is in radians)
KEYWORDS FOR COMPUTE SPECIFICATIONS
The specifications other than the expression are given in the
form "keyword = something" with one keyword per line. Keywords
may be given in any order, either in upper or in lower case. The
valid keywords are as follows (with significant characters shown
in capital letters):
Keywords Defining Input Variables and Computations (all optional)
Keyword       Possible Specification           Default (if no keyword)
_____________________________________________________________________
STudies=      path of source dataset(s)        Look for input variables
                                               only in current directory
MISSing=      Valid                            Exclude input missing data,
                                               or out of range values
SEED=         seed for random numbers          Use system clock and
                                               process ID.
Keywords Defining the New Variable (all are optional)
Keyword       Possible Specification           Default (if no keyword)
_____________________________________________________________________
OUTSTudy=     path of study for new variable   Current directory
DECimals=     maximum number of decimals       Indefinite number of
              to store                         decimal places
CATlabels=    (precedes lines of category      No category text
              labels - see details below)      or labels
LABEL=        long label for new variable      No long label
OVERwrite=    Yes                              Do not overwrite new var
                                               if it already exists
TEXT=         (precedes lines of descriptive   No descriptive text
              text -- see details below)
MD=           list of invalid codes, ranges    No defined MD codes
              (also used for output value
              if input vars have MD
              -- see below)
MIN=          minimum valid value              No defined minimum
MAX=          maximum valid value              No defined maximum
Other options
Keyword       Possible Specification           Default (if no keyword)
_____________________________________________________________________
DIAGnostics=  yes                              No diagnostic summary of
                                               the new variable
COLorcoding=  yes                              No colored headings in the
                                               diagnostic output
GVARCase=     LOWER or UPPER                   Do not convert all variable
                                               names to lower/upper case
LAnguagefile= Name of file with non-English    English labels on
              labels and messages              output
SAVebatch=    name of directory                No file preserved with batch
                                               commands to create new var
              (for interactive version)
              The batch file name is the
              name of the new variable,
              with the suffix '.cmp'
NOTES ON THE KEYWORDS
ABBREVIATIONS AND REPETITIONS
Most keywords can be abbreviated. Usually only two or three
characters are required. Either upper or lower case may be used.
The keyword for the category text for the new variable, for
instance, can be given as `catlabels=' or `CATegories=' or
`cat='. If keywords are repeated, the second specification will
override the first.
COMMENTS
Anything on a line beginning with "#" is ignored by the batch
processor and can therefore be used for comments. Blank lines
are also ignored, except as a part of descriptive text.
CATEGORY TEXT AND LABELS
Category text and labels for one or more codes of the new
variable can be supplied. First put the `CATlabels=' keyword on
a line by itself; then specify on a separate line each code,
followed by one or more spaces or tabs, then the category text
[and short label, if desired]. (Programs such as TABLES and
MEANS will use the short label for a category, if one is
available.) Put an asterisk (*) on a line by itself after the
last label. For example:
catlabels=
0 Lowest value [Low]
5 Medium
10 Highest value [High]
*
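The layout of a category line (code, then spaces or tabs, then the text, then an optional short label in square brackets) can be made concrete with a small Python parser. This is a reading aid under stated assumptions, not SDA's own parser. The function name `parse_catlabel' is hypothetical, and the sketch assumes the short label is the final bracketed item on the line.

```python
import re


def parse_catlabel(line):
    """Split one category line into (code, text, short_label).

    The code is kept as a string. short_label is None when no
    bracketed label is present at the end of the line.
    """
    code, rest = line.split(None, 1)   # split on the first run of spaces/tabs
    match = re.match(r"^(.*?)\s*\[([^\]]*)\]\s*$", rest)
    if match:
        return code, match.group(1), match.group(2)
    return code, rest.strip(), None


for line in ["0 Lowest value [Low]", "5 Medium", "10\tHighest value [High]"]:
    print(parse_catlabel(line))
```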
DESCRIPTIVE TEXT
Descriptive text may be stored with the new variable. This text
can then be displayed when the variable is used in analysis
programs or in a codebook. First put the `TEXT=' keyword on a
line by itself; then write as many lines of text as you wish to
store with the new variable. Put an asterisk (*) on a line by
itself after the last line of text.
MISSING DATA ON THE NEW VARIABLE
If the value of the new variable cannot be computed for a case
(usually because the input variables have missing data or because
there is no `else' after an `if' statement), the output variable
will take on one of two values, depending on the options that
were specified:
- If the `MD=' keyword was specified, the case will be
assigned the value specified with the `MD=' keyword. If more
than one MD value was specified, the first MD value is used for
this purpose. Note that all values mentioned after the `MD='
keyword are flagged as missing-data in the new variable.
- If the `MD=' keyword has not been specified, the case will
be assigned the system missing-data value.
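The two-way rule above reduces to a few lines of Python. The names are hypothetical, and None is used as a stand-in for the system missing-data value, whose actual representation is not described here. The sketch also assumes that the first item given after `MD=' is a single code rather than a range.

```python
SYSTEM_MISSING = None  # stand-in for SDA's system missing-data value


def value_when_uncomputable(md_values):
    """Value stored when the new variable cannot be computed for a case.

    md_values is the list given with `MD=', or an empty list when
    the keyword was not specified.
    """
    if md_values:
        return md_values[0]      # the first MD value is used
    return SYSTEM_MISSING


print(value_when_uncomputable([9, 99]))  # 9
print(value_when_uncomputable([]))       # None
```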
MULTIPLE COMPUTES
COMPUTE commands to create more than one variable can be included
in the same batch file. After the first set of commands
(expression plus other specifications), put a line beginning with
two asterisks (**); then the commands for another new variable
can follow. The values of the `STudies=' and `OUTSTudy='
keywords are carried over from the previous set of commands,
unless they are respecified.
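The `**' separator and the carry-over rule can be sketched in Python. This is an illustration under assumptions, not SDA's batch reader: the function names are hypothetical, the keyword lines are assumed to be already parsed into dictionaries with full lower-case keyword names, and a `**' line inside descriptive text is not treated specially.

```python
def split_computes(text):
    """Split batch-file text into one list of lines per compute set."""
    sets, current = [], []
    for line in text.splitlines():
        if line.startswith("**"):   # separator between compute sets
            sets.append(current)
            current = []
        else:
            current.append(line)
    if current:
        sets.append(current)
    return sets


def carry_over(spec_sets):
    """Apply the rule that only studies/outstudy persist across sets."""
    carried, result = {}, []
    for specs in spec_sets:
        merged = {**carried, **specs}   # respecified keywords win
        result.append(merged)
        carried = {k: merged[k] for k in ("studies", "outstudy") if k in merged}
    return result


batch = "newvar1 = a + b\n*\nlabel=First\n**\nnewvar2 = a * b\n*\n"
print(split_computes(batch))
print(carry_over([{"studies": "/sda/demostudy", "overwrite": "yes"},
                  {"label": "Second"}]))
```

In the second call, the second set inherits `studies' but not `overwrite', which is why `overwrite=yes' has to be repeated for each new variable that needs it.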
BACKWARD COMPATIBILITY
COMPUTE can read most older CSA compute commands. The following
keywords are still recognized and are equivalent to the new
keywords shown in parentheses:
- longlabel (label)
- labels (catlabels)
- scale (ignored)
EXAMPLES OF BATCH FILES
Basic example
Basic example: compute the sum of 2 variables
(with each variable coded 1-5)
newvar = spend + spend2
*
study = c:\archive\nes96 (PC syntax)
study = /archive/nes96 (UNIX syntax)
label=Sum of spend and spend2
catlabels=
2 Lowest
6 Medium
10 Highest
*
Multiple complex computes in the same file
(with each set of compute commands separated by `**')
# 1. Count the number of occurrences of `1'
newvar1 = count(spend, spend2, spend3, spend4 (1) )
*
study = /sda/demostudy
label=Number of `spend too much' in spend - spend4
overwrite=yes
text =
This variable counts the number of times that the code '1'
(for govt. spends too much on this) is recorded.
*
**
# 2. Create a second new variable in this run.
# Compute the mean of 3 variables; at least 2 must have valid
# codes.
newvar2 = mean.2(spend, spend2, spend3)
*
label=Average of spend, spend2, and spend3
md=9
**
# 3. Also create a random variable with a normal distribution
# random variable will have mean=0, standard deviation=10.
newvar3=normal(0,10)
*
label=Random numbers with mean 0,sd 10
seed= 12121
CSM, UC Berkeley/ISA
July 6, 2021